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Abstract 

Gaussian network model (GNM) is one of the most accurate and efficient methods for biomolecular flexibility analysis. 
However, the systematic generalization of the GNM has been elusive. We show that the GNM Kirchhoff matrix can 
be built from the ideal low-pass filter, which is a special case of a wide class of correlation functions underpinning 
the linear scaling flexibility-rigidity index (FRI) method. Based on the mathematical structure of correlation functions, 
we propose a unified framework to construct generalized Kirchhoff matrices whose matrix inverse leads to correlation 
function based GNMs, whereas, the direct inverse of the diagonal elements gives rise to FRI method. We illustrate 
that correlation function based GNMs outperform the original GNM in the B-factor prediction of a set of 364 proteins. 
We demonstrate that for any given correlation function, FRI and GNM methods provide essentially identical B-factor 
predictions when the scale value in the correlation function is sufficiently large. 
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Under physiological condition, proteins undergo everlast¬ 
ing motions, ranging from atomic thermal fluctuation, side- 
chain rotation, residue swiveling, to domain swirling, protein 
motion strongly correlates with protein functions, including 
molecular docking,^ drug binding,^ allosteric signaling,^ self 
assembly^^ and enzyme catalysis.The range of protein 
motions in a cellular environment depends on the structure’s 
local flexibility, an intrinsic property of a given protein struc¬ 
ture. Protein flexibility is reflected by the Debye-Waller or B- 
factor, i.e., the atomic mean-square displacement, obtained 
in structure determination by x-ray crystallography, NMR, or 
single-molecule force experiments.^ However, the B-factor 
cannot absolutely quantify flexibility: it also depends the 
crystal environment, solvent type, data collection condition 
and structural refinement procedure. 

The flexibility of a biomolecule can be assessed by nor¬ 
mal mode analysis (NMA),^’elastic network model 
(ENM),^^’^^ Gaussian network model (GNM),^’"^ anisotropic 
network model (ANM),^ graph theory,etc. NMA can be 
regarded as time-independent molecular dynamics (MD).^° 
NMA diagonalizes the MD potential to obtain a set of eigen¬ 
values and eigenvectors, where first few eigenvectors predict 
the collective, global motions, which are potentially relevant 
to biomolecular functionality. NMA with only the elasticity 
potential, which is a leading term in the MD potential, was 
introduced by Tirion,^^ and was extended to the network 
setting in ANM.^ Here network refers to the connectivity 
between particles regardless of their chemical bonds.^ The 
GNM is a highly accurate network based flexibility method. 
Although it was originally advocated as an ENM^ and in¬ 
terpreted with the random Gaussian network theory,^ the 
GNM is strictly not an elastic modeP^— it utilizes a Kirch- 
hoff matrix, rather than the harmonic potential of elasticity. 
Additionally, its computational procedure does not directly 
invoke the random Gaussian network theory. Due to the 
lack of in-depth understanding, there is no rigorous anal¬ 
ysis and/or systematic generalization of the GNM in the 
literature. Therefore, there is a pressing need to better un¬ 
derstand the working principle of the GNM theory, which is 
crucial for its further improvement. 

A common feature of the aforementioned approaches is 
that, they all depend on the mode decomposition of the po¬ 
tential matrix, which typically has the computational com¬ 
plexity of 0{N^), where N is the number of elements in the 
potential matrix. Yang et al.^^ demonstrated that due to its 
network setting, the GNM is about one order more efficient 
than most other flexibility approaches. 

Recently, we have proposed a few mode-decomposition 
free methods for flexibility analysis, i.e., molecular nonlinear 
dynamics,stochastic dynamics^^ and flexibility-rigidity in¬ 
dex (FRI).^^’^^ Among them, the FRI is of 0{N‘^) in com¬ 
putational complexity and has been accelerated to 0{N) 
without loss of accuracy.The essential idea of the FRI 
method is to evaluate the rigidity index or the compactness 
of the biomolecular (network) packing by the total correla¬ 
tion. Then the flexibility index is defined as the inverse of 
the rigidity index. The correlation between any two atoms or 


residues is measured through correlation functions. The FRI 
can be regarded as a generalization of Halle’s local density 
model.The FRI method has been shown to be orders of 
magnitude more efficient and about ten percent more accu¬ 
rate than the GNM for the B-factor prediction of a set of 
365 proteins. 

The objective of the present work is to shed light on the 
GNM and FRI methods. Specifically, we reveal that the 
ideal low filter used in the GNM Kirchhoff matrix is a spe¬ 
cial admissible FRI correlation function, which is the limit¬ 
ing case of many commonly used FRI correction functions. 
This finding paves the way for understanding the connec¬ 
tion between the GNM and FRI methods. Additionally, we 
introduce a generalized Kirchhoff matrix to provide a uni¬ 
fied starting point for the GNM and FRI methods, which 
throws light on the similarity and difference between GNM 
and FRI. Moreover, based on this new understanding of the 
GNM working principle, we propose infinitely many correla¬ 
tion function based GNM methods. Finally, we unveil that 
the FRI and the GNM are asymptotically equivalent when 
the cutoff value in the Kirchhoff matrix or the scale value 
in the correlation function is sufficiently large. The present 
work offers a new strategy for the design and construction 
of accurate, efficient and robust methods for biomolecular 
flexibility analysis. 

To establish notation and facilitate new development, let 
us present a brief review of the GNM and FRI methods. 
Consider an A/'-particle coarse-grained representation of a 
biomolecule. We denote {r^|r^ G = 1 , 2 ,-•• , A^} the 
coordinates of these particles and rij = Ijr^ — Tj\\ the Eu¬ 
clidean space distance between ith and jth particles. In 
a nutshell, the GNM prediction of the ith B-factor of the 
biomolecule can be expressed as^’^ 

= = ,N, (1) 

where a is a fitting parameter that can be related to the 
thermal energy and (T”^).. is the ith diagonal element of 
the matrix inverse of the Kirchhoff matrix. 
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where Tc is a cutoff distance. The GNM theory evaluates the 
matrix inverse by where T 

is the transpose and and are the kth eigenvalue and 
eigenvector of T, respectively. The summation omits the 
first eignmode whose eigenvalue is zero. 

The FRI prediction of the ith B-factor of the biomolecule 
can be given by^^’^^ 


B: 
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where a and b are fitting parameters, /, = - ^-7 -r 

is the ith flexibility index and 's the 
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Figure 1: Illustration of admissible correlation functions, (a) Correla¬ 
tion functions approach the ILF as /c — oo or oo at 77 = TA. 

(b) Effects of varying scale value 77 . Local correlation is obtained with 
large v and small 77 values. Whereas, nonlocal correlation is generated 
by small v and large 77 values. 

ith rigidity index. Here, Wj is an atomic number depended 
weight function that can be set to Wj = 1 for a net¬ 
work, and ^{rij;r]) is a real-valued monotonically decreas¬ 
ing correlation function satisfying the following admissibility 
conditions 


Mathematically, the ILF is a special real-valued monoton¬ 
ically decreasing correlation function and also satisfies ad¬ 
missibility conditions (4) and (5). In fact, all FRI correlation 
functions are low-pass filters as well. Therefore, both GNM 
and FRI admit low-pass filters in their constructions. In¬ 
deed, GNM is very special in the sense that there is only 
one unique ILF, while, there are infinitely many other low- 
pass filters. Figure 1 illustrates the behavior and relation of 
the above low-pass filters or correlation functions. Clearly, 
the ILF is completely localized for any given cutoff value. 
In general, generalized exponential function and generalized 
Lorentz function are delocalized and the former decays faster 
than the latter for a given power. The combination of a low 
power value and a large scale gives rise to nonlocal correla¬ 
tions. Our earlier test indicates that v = 3 and rj = SA pro¬ 
vides a good flexibility analysis for a set of 365 proteins.^^ 
To further bring to light the mathematical foundation of 
the GNM and FRI methods, we consider a generalized Kirch- 
hoff matrix^^-^^ 
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where is a scale parameter. Delta sequences of the positive 
type^^ are good choices. Many radial basis functions are also 
admissible.Commonly used FRI correlation functions 
include the generalized exponential functions 

^{vij^T], k) = , K>0 (6) 

and generalized Lorentz functions 

A major advantage of the FRI method is that it does not 
resort to mode decomposition and its computational com¬ 
plexity can be reduced to 0(N) by means of the cell lists 
algorithm used in our fast FRI (fFRI).^^ In contrast, the 
mode decomposition of NMA and GNM has the computa¬ 
tional complexity of O(N^). 

To further explore the theoretical foundation of GNM, let 
us examine the parameter limits of generalized exponential 
functions (6) and generalized Lorentz functions (7) 

. , / / xt; ^ ^{rij;rc) as ^ oo, (9) 

1 + inj/v) 

where Vc = r] and ^{vij^Vc) is the ideal low-pass filter (ILF) 
used in the GNM Kirchhoff matrix 


^ injure) 


1 , nj < Tc 
0 , Tij > Vc 
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Relations (8) and (9) unequivocally connect FRI correlation 
functions to the GNM Kirchhoff matrix. It is important to 
examine whether the ILF is still a FRI correlation function. 


where ^{rij;r]) is an admissible FRI correlation function. 
The generalized Kirchhoff matrix includes the Kirchhoff ma¬ 
trix as a special case. It is important to note that each diag¬ 
onal element is a FRI rigidity index: jii = Therefore, 

the generalized Kirchhoff matrix provides a unified starting 
point for both the FRI and GNM methods. However, the 
striking difference between the GNM and FRI methods is 
that to predict B-factors, the GNM seeks a matrix inverse of 
the Kirchhoff matrix (2), whereas, the FRI takes the direct 
inverse of the diagonal elements of the generalized Kirchhoff 
matrix (11). 

Based on the above analysis, it is straightforward to con¬ 
struct correlation function based GNMs via the matrix in¬ 
verse of the generalized Kirchhoff matrix (11), which leads 
to infinitely many new GNMs, including the original GNM 
as a special limiting case. It is also possible to contruct to 
construct the FRI by using the Kirchhoff matrix, which gives 
rise to a unique FRI. Question arise as what are the relative 
performance of these correlation function based GNM and 
FRI methods. Another question is whether there is any fur¬ 
ther relation between these two distinguished approaches. 
Specifically, what is the relation between the diagonal ele¬ 
ments of the GNM matrix inverse and the FRI direct inverse 
of the diagonal elements, for a given generalized Kirchhoff 
matrix? To answer these questions, we select two represen¬ 
tative correlation functions, i.e., the Lorentz [v = 3) and ILF 
functions to construct the generalized Kirchhoff matrix (11). 
The Lorentz function is a typical example for many correla¬ 
tion functions studied in our earlier work.^^ In contrast, the 
ILF function is an extreme case of FRI correlation functions. 
The resulting two generalized Kirchhoff matrices (11) can be 
used for calculating the GNM matrix inverse or the inverse 
diagonal elements of the FRI matrix. This results in possi¬ 
ble combinations or methods, namely, FRI-Lorentz, FRI-ILF, 
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Figure 2: Illustration of protein 2Y7L. (a) Structure of protein 2Y7L 
having two domains; (b) Correlation map generated by using GNM- 
Lorentz indicating two domains; (c) Comparison of experimental B- 
factors and those predicted by GNM-Lorentz (77 = IbA); (d) Com¬ 
parison of experimental B-factors and those predicted by FRI-ILF 
(rc = 24A). 

GNM-Lorentz and GNM-ILF. Perfornnances of these meth¬ 
ods are carefully analyzed. 

To answer the above mentioned questions, we first em¬ 
ploy a pathogenic fungus Candida albicans (Protein Data 
Bank ID: 2Y7L) with 319 residues as shown in Fig. 2(a) to 
explore the aforementioned four methods. We consider the 

coarse-grained representation of protein 2Y7L. We de¬ 
note ^GNM—ILF ^FRI—ILF ^GNM-Lorentz ^FRI—Lorentz 

respectively the predicted B-factors of GNM-ILF, FRI-ILF, 
GNM-Lorentz and FRI-Lorentz methods. The experimen¬ 
tal B-factors from X-ray diffraction, are employed 

for a comparison. The Pearson product-moment correla¬ 
tion coefficient (PCC) is used to measure the strength of 
the linear relationship or dependence between each two sets 
of B-factors. To evaluate the performance of four methods, 
we compute the PCCs between predicted B-factors and ex¬ 
perimental B-factors. Since performance of these methods 
depends on their parameters, i.e., the cutoff distance (fc) in 
the ILF or the scale value ( 77 ) in the Lorentz function, the 
theoretical B-factors are computed over a wide range of Vc 
and T] values. 

Figure 3 depicts PCCs between various B-factors for pro¬ 
tein 2Y7L. As shown in Fig. 3 (a), the cutoff distance Vc 
of the ILF is varied from SA to 64A. The PCCs between 
^GNM-ILF gpj ^Exp^ gpj between ^FRI-ILF ^Exp^ 

indicate that both GNM-ILF and FRI-ILF are able to provide 
accurate predictions of the experimental B-factors. Their 
best predictions are attained around Tc = 24A, which is sig¬ 
nificantly larger than the commonly used GNM cutoff dis¬ 
tance of 7-9A, partially due to the fact that protein 2Y7L is 
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Figure 3: PCCs between various B-factors for protein 2Y7L. (a) Corre¬ 
lations between and between ^Exp^ 

and between Correlations between 

^GNM-Lorentz B^^P, between BFR^I-Lorentz g^^ B^^P, and be¬ 
tween and 

relatively large. 

It is interesting to observe that GNM-ILF and FRI-ILF pro¬ 
vide essentially identical predictions when the cutoff distance 
is equal to or larger than 20A. This phenomenon indicates 
that when the cutoff is sufficiently large, the diagonal ele¬ 
ments of the GNM inverse matrix and the direct inverse of 
the diagonal elements of the FRI correlation matrix become 
linearly dependent. To examine the relation between GNM- 
ILF and FRI-ILF, we compute PCCs between ^gnm-ile g^j 

^FRI-ILF 

over the same range of cutoff distances. As shown 
in Fig. 3(a), there is a strong linear dependence between 
^GNM-ILF g^j ^FRI-ILF understand this 

dependence at large cutoff distance, we analytically calculate 
ith diagonal element of the GNM inverse matrix 

^ oo)))., = ^ as N ^ oo ( 12 ) 

and the FRI inverse of the ith diagonal element 
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These results elucidate the strong asymptotic correlation be¬ 
tween ^GNM-ILF gpj ^FRI-ILF pjg 3 ^g^ They also 

explain why predictions of the original GNM and FRI-ILF de¬ 
teriorate as Tc is sufficiently large because all the predicted 
B-factors become identical, i.e., with N = 319. 

The performance and comparison between GNM-Lorentz 
and FRI-Lorentz are illustrated in Fig. 3(b) for the scale 
value 77 from O.SA to 64A. First, it is seen that the GNM- 
Lorentz is a successful new approach. In fact, it outperforms 
the original GNM for the peak PCCs. A comparison of the 
predicted B-factors and the experimental B-factors is plot¬ 
ted in Figs. 2(c) and 2(d) for GNM-Lorentz and FRI-ILF, 
respectively. It is seen that ^fri-ilf closely matches 
the experimental B-factors than ^CNM-Lorentz 
the different fitting schemes employed by two methods as 
shown in Eqs. (1) and (3), respectively. 

As shown in Fig. 3(b), the predictions from GNM-Lorentz 
and FRI-Lorentz become identical as 77 > SA. A strong corre¬ 
lation between ^GNM-Lorentz g^j ^FRI-Lorentz jg revealed 
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Figure 4: PCCs between various B-factors averaged over 364 proteins, 
(a) Correlations between s^nm-ilf ^Exp^ between 
and and between ^gnm-ilf ^fri-ilf. Correlations 

between ^GNM-Lorentz g^^ ^Exp^ between and B^^p, 

and between ^GNM-Lorentz g^^j ^FRI-Lorentz 


at an even smaller scale value. This behavior leads us to 
speculate a general relation 






77 ^ oc, (14) 


where c is a constant. Relation (14) means that the correla¬ 
tion function based GNM is equivalent to the FRI for a given 
admissible correlation function when the scale parameter is 
sufficiently large. This relation is certainly true for the ILF 
as analytically proved in Eqs. (12) and (13). Relation (14) 
is a very interesting and powerful result not only for sake 
of understanding GNM and FRI methods, but also for the 
design of accurate and efficient new methods. 

It remains to prove that the above findings from a sin¬ 
gle protein are translatable and verifiable to a large class 
of biomolecules. To this end, we consider a set of 364 pro¬ 
teins, which is a subset of the 365 proteins utilized and docu¬ 
mented in our earlier work.^^ The omitted protein is lAGN, 
which has been found to have unrealistic experimental B- 
factors. We carry out systematic studies of four methods 
over a rang of cutoff distances or scale values. For each 
given Vc or 77 , the PCCs between two sets of B-factors are 
averaged over 364 proteins. Figure 4 illustrates our results. 
Figure 4(a) plots the results of the ILF implemented in both 
GNM and FRI methods with the cutoff distance varied from 
4A to 23 A. Figure 4(b) depicts similar results obtained by us¬ 
ing the Lorentz function implemented in two methods. The 
scale value is explored over the range of O.SA to lOA. 

First, the proposed new method, GNM-Lorentz, is very 
successful for the B-factor prediction of 364 proteins as shown 
in Fig. 4(b). The best GNM-Lorentz prediction is about 
10.7% better than that of the original GNM shown in Fig. 
4(a). In fact, GNM-Lorentz outperforms the original GNM 
over a wide range of parameters for this set of proteins, which 
indicates that the proposed generalization is practically valu¬ 
able. Similarly, FRI-Lorentz is also about 10% more accu¬ 
rate than FRI-ILF in the B-factor prediction. Since the ILF 
is a special case and there are infinitely many FRI correla¬ 
tion functions, there is a wide variety of correlation function 
based GNMs that are expected to deliver more accurate flex¬ 
ibility analysis than the original GNM does. 


Additionally, the FRI-Lorentz method is able to attain the 
best average prediction for 364 proteins among four methods 
as shown in the zoomed in parts in Fig. 4(b). However, for 
a given correlation function, the difference between FRI and 
GNM predictions is very small. 

Moreover, for a given admissible FRI function, GNM and 
FRI B-factor predictions are strongly linearly correlated and 
reach near 100% correlation when Tc > OA or 77 > O.SA for 
364 proteins as demonstrated in Fig. 4. This finding offers a 
solid confirmation of Eq. (14). Therefore, correlation func¬ 
tion based GNMs, including the original GNM as a special 
case, are indeed equivalent to the corresponding FRI meth¬ 
ods in the flexibility analysis for a wide range of commonly 
used scale values. 

Furthermore, it has been shown that the fast FRI is a 
linear scaling method,while GNM scales as 0{N^) due 
to their matrix inverse procedure. As a result, the accumu¬ 
lated CPU times for the B-factor predictions of 364 proteins 
at rc = 7 or 77 = 3 are 0.88, 1.57, 5071.32 and 4934.79 
seconds respectively for the FRI-ILF, FRI-Lorentz, GNM-ILF 
and GNM-Lorentz. In fact, GNM methods are very fast for 
small proteins as well. Most of the accumulated GNM CPU 
times are due to the computation of three largest proteins 
(i.e., 1F8R, 1H6V and IQKI) in the test set. 

It is worth mentioning that that the earlier FRI rigidity 
index includes the contribution from the self correlation, 
i.e., the diagonal term.^^’^^ The present findings do not 
change if the summation in the generalized Kirchhoff ma¬ 
trix (11) is modified to include the diagonal term and then 
the calculation of GNM matrix inverse is modified to in¬ 
clude the contribution from first eigen mode, i.e., (T”^).. = 

[ukU^]... In fact, this modification makes the 
generalized Kirchhoff matrix less singular and fast converg¬ 
ing. 
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